Reversible Machine Translation: What To Do When The Languages Don't Line Up

Authors

  • James Barnett
  • Inderjeet Mani
  • Paul Martin
  • Elaine A. Rich
Abstract

In this paper, we deal with issues that face an interlingua-based, reversible machine translation system when the literal meaning of the source text is not identical to the literal meaning of the natural target translation. We present an algorithm for lexical choice that handles such cases and that relies exclusively on reversible, monolingual linguistic descriptions and a language-independent domain knowledge base.

1 Introduction

Machine translation is an obvious application for reversible natural language systems, since both understanding and generation are important parts of the process. There are several arguments for this view (for example, [Isabelle, 89]), including reducing the total cost of adding a new language and making it easier to maintain and validate the resulting system.

Reversible MT systems, just like the broader class of MT systems as a whole, fall into two roughly defined families: transfer systems and interlingua (or pivot) systems. Reversible transfer systems (e.g., [van Noord, 90], [Zajac, 90], [Dymetman, 88] and [Strzalkowski, 90]) exploit three reversible subsystems: one to analyze the source text, one to perform the transfer, and a third to generate the target text. Interlingua-based systems (e.g., Ultra [Farwell, 90]), on the other hand, require only two reversible components: one to analyze the source text into the interlingua representation, and one to generate the target text from that representation. In this paper, we will focus on issues that arise in the design of interlingua-based MT systems.

The simplest model of a reversible, interlingua-based system contains two components: one analyzes the source text to create the interlingua representation and the other maps from that to the target text. Unfortunately, the real situation is not that simple, for several reasons, including two that we will focus on here:

  • This model assumes that the same information is present in the target text as in the source.
But in some cases, which have been called translation mismatches [Kameyama, 91], information is either added to or deleted from the source in creating the target. We will show some examples of this below in Section 2. In these cases, the simple reversible system we outlined above would produce unacceptable translations.

  • Although the notion of a reversible system that describes the set of legal translations is reasonably clear-cut, the notion of preferred translation is more difficult to define [van Noord, 90], [Barnett, 91d]. In some cases, which have been called translation divergences [Dorr, 90], the most natural translation differs from the source in some significant way (e.g., its focus).

Of course, in many cases, both of these issues occur together and interact.

In this paper, we present some techniques for dealing with these problems. These techniques have three important properties:

  • They require purely declarative, reversible descriptions of the languages that are involved.
  • They require only monolingual facts. Thus new languages can be added to the system without any changes to the descriptions of any other languages.
  • They are stated in a way that enables their performance to increase gradually along with the power of the underlying knowledge base.

2 Translation Divergences and Mismatches

In this section, we examine some examples in which the source and target languages do not line up. Then, in the rest of the paper, we will outline our solution to these problems.

1. English: "The dogs were running down the street."
   Japanese: "inu ga toori-o hashitte-ita." (lit. "dog run (along) the street.")

In English, noun phrases must be marked for number. In the natural Japanese translation, number information is absent.

2. English: "I saw a fish in the water."
   Spanish: "Vi un pez en el agua."
   English: "I ate a fish."
   Spanish: "Comí un pescado."
Spanish makes a distinction between a fish in its natural state ("pez") and a fish that has been caught for food ("pescado"). "Pez" is also the default form in case it is not clear or does not matter what state the fish is in. But it cannot be used if it is clear that the fish has been caught. To get the translation right, it is necessary to infer extra information about the fish, using other knowledge that is available either from the rest of the sentence or from the larger discourse context. Similarly, to reverse the process and go from Spanish to English, it is necessary, in the case of "pescado", to throw away information lest we produce the unnatural translation, "I ate a caught fish." It is important to note, though, that this information cannot be thrown away during understanding, since it would be important if we were translating into another language that made the same distinction. It must be preserved until the point at which generation into the target language takes place.

3. English: "I know him."
   Spanish: "Lo conozco."
   English: "I know the answer."
   Spanish: "Sé la respuesta."

Here the issue is the correct translation between English "know" and the two Spanish verbs "conocer" (to be acquainted with someone) and "saber" (to know a fact). This example is similar to the previous one except that here there is no default form. Spanish does not have a word that includes these two different events.

4. English: "I have a headache."
   Japanese: "Atama ga itai." (literally, "my head hurts")

Here the problem is more difficult. No longer is it an issue of a single lexical item for which there is not an exact match in the target language. Instead, the texts in the two languages differ at the level of an entire phrase, with each language choosing a phrase that describes the situation from a different point of view. In English, we seem to describe an object, "a headache", while Japanese describes the state of a head hurting.
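The lexical facts in Examples 2 and 3 can be illustrated with a small sketch. This is not the algorithm presented in the paper; it is a minimal illustration that assumes each word is described monolingually by a set of required and forbidden properties, that meanings are flat sets of property names, and that a toy rule ("an eaten fish has been caught") stands in for knowledge-base inference. All names here (`LexicalEntry`, `infer`, `choose_word`, and the property strings) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalEntry:
    """Hypothetical monolingual description of one word."""
    word: str
    requires: frozenset                 # properties that must be known to hold
    forbids: frozenset = frozenset()    # properties that must not be known to hold


# Toy stand-in for KB inference: (premises, conclusion) pairs.
RULES = [(frozenset({"fish", "eaten"}), "caught")]

SPANISH = [
    LexicalEntry("pez", frozenset({"fish"}), frozenset({"caught"})),
    LexicalEntry("pescado", frozenset({"fish", "caught"})),
    LexicalEntry("conocer", frozenset({"know", "object_is_person"})),
    LexicalEntry("saber", frozenset({"know", "object_is_fact"})),
]

ENGLISH = [
    LexicalEntry("fish", frozenset({"fish"})),
    LexicalEntry("know", frozenset({"know"})),
]


def infer(facts, rules):
    """Forward-chain over the rules until no new fact can be added."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts


def choose_word(facts, lexicon):
    """Return the most specific applicable word, or None if none applies."""
    candidates = [
        entry for entry in lexicon
        if entry.requires <= facts and not (entry.forbids & facts)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda entry: len(entry.requires)).word


seen = choose_word(infer({"fish", "seen_in_water"}, RULES), SPANISH)
eaten = choose_word(infer({"fish", "eaten"}, RULES), SPANISH)
back_to_english = choose_word({"fish", "caught"}, ENGLISH)
underspecified_know = choose_word({"know"}, SPANISH)
```

Under these assumptions, the default form "pez" is selected unless "caught" can be inferred, the English entry simply ignores "caught" at generation time, and an underspecified "know" yields no Spanish word at all, which mirrors the absence of a default form in Example 3.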
The examples that we have just discussed illustrate three different categories of semantic differences between languages:

  • Mismatches caused by semantically significant differences in morphology and syntax, e.g., Example 1. Other common examples involve the presence or absence of markings for gender, number, tense, aspect, and level of politeness.
  • Mismatches caused by lexical differences, where one language has a word that the other lacks, e.g., Examples 2 and 3.
  • Divergences, in which the two languages describe the same state of the world in different ways, as in Example 4. In some of these cases, identical information is conveyed (in the sense that the semantic interpretation of the source implies that of the target and vice versa), but in some cases (and depending on the particular model of the world that is being used to define implication) the semantic content of the two forms will not be identical, so many cases of divergence also contain mismatches.

Mismatches and divergences are typically viewed as translation (transfer) problems. But in an interlingua-based system it becomes clear that they are primarily problems for generation. The source language analyzer produces an interlingua representation, which the target generator must render into the target language. In cases of mismatch or divergence, doing this requires manipulating the interlingua expression itself since it does not already correspond exactly to the structure of the target string that should be produced. But actually, the fact that the expressions in the interlingua representation came from linguistic expressions in a source language as opposed to from some other source (for example, the output of a problem-solving system) is irrelevant except for a few special cases in which the form of the source language expressions can provide help in making generation decisions.
So, in the rest of this paper, we will present a generation-centered treatment of mismatches that relies entirely on reversible, monolingual descriptions of the two languages.

3 The KBNL MT System

Figure 1 shows a schematic description of the MT system that we are building. All of the representations in the figure, except the source and target language strings, are described in terms that are drawn from a knowledge base (KB) that describes the domain(s) of discourse. In addition to providing a common set of terms that enable meanings to be defined, this backend knowledge base is important because it provides the ability to reason about meanings and thus the ability to add to the target text information that was omitted from the source. We will assume that all the KB-based representations can be treated as sets of logical assertions (although they can of course be implemented in a variety of ways, including the frame-based system [Crawford, 90] that we are using).

[Figure 1: schematic description of the MT system]
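Treating a meaning as a set of assertions makes it easy to see where a mismatch arises for a given target language. The sketch below is only an illustration under stated assumptions: assertions are reduced to (feature, value) pairs, and each language is described by the hypothetical set of features it marks. The feature inventories `ENGLISH_MARKS` and `JAPANESE_MARKS` are deliberately simplified and are not taken from the paper.

```python
def mismatch(interlingua, target_marks):
    """Compare a set of (feature, value) assertions with the features a target marks.

    Returns two sorted lists: features that the target will leave unexpressed,
    and features that the generator would have to obtain by inference.
    """
    present = {feature for feature, _value in interlingua}
    dropped = sorted(present - target_marks)
    needed = sorted(target_marks - present)
    return dropped, needed


# Simplified, hypothetical feature inventories for illustration only.
ENGLISH_MARKS = frozenset({"number", "tense"})
JAPANESE_MARKS = frozenset({"tense", "politeness"})

# Assertions an analyzer might produce for Example 1 from each side.
from_english = {("number", "plural"), ("tense", "past-progressive")}
from_japanese = {("tense", "past-progressive"), ("politeness", "plain")}

to_japanese = mismatch(from_english, JAPANESE_MARKS)
to_english = mismatch(from_japanese, ENGLISH_MARKS)
```

In this toy setting, generating Japanese leaves "number" unexpressed, while generating English from the Japanese analysis requires "number" to be supplied, which is where reasoning over the knowledge base would come in.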

Similar resources

Translation Technology Tools and Professional Translators’ Attitudes toward Them

Today technology is an integral part of professional translation; and it is generally assumed that translators’ attitudes toward translation technology tools influence their interaction with technology (Bundgaard, 2017). Therefore, the present two-phase study seeks to shed some light on what translation technology tools are and how professional translators feel toward them. The research method ...

What Do Voices Say in The Garden Party? An Analysis of Voices in the Persian Translation of Mansfield's Short Story

This study aims at investigating voices in the Persian translation of Katherine Mansfield's The Garden Party. In so doing, after a stylistic analysis of the voices in the original is done, it is argued by the authors that the polyphonous nature of the story is to a great extent due to the deployment of various sociolects in the story as well as the choice of Free Indirect Discourse (FID) as the...

The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language

Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages is still under question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...

‘Minor’ Languages, ‘Broken’ Translations: On Brazilian Reworkings of an Albanian Novel

This essay approaches the challenges of global translation in the 21st century from what might still be considered a somewhat uncommon example: a direct translation of Ismail Kadaré's 1978 novel Prill e thyër (Broken April) from the original Albanian into Brazilian Portuguese in 2001. Not only does it examine and compare lexical elements in the source and target texts and the usage of translato...

Don't Until the Final Verb Wait: Reinforcement Learning for Simultaneous Machine Translation

We introduce a reinforcement learning-based approach to simultaneous machine translation (producing a translation while receiving input words) between languages with drastically different word orders: from verb-final languages (e.g., German) to verb-medial languages (English). In traditional machine translation, a translator must “wait” for source material to appear before translation begins. We ...

Publication date: 1991